Retail Card Portfolio: Fraud Intelligence & Transaction Risk Report

Author

Freda Erinmwingbovo, Data Analyst

Published

February 23, 2026

CONFIDENTIAL. INTERNAL USE ONLY. NORTHBRIDGE BANK RISK INTELLIGENCE DIVISION

NorthBridge Bank: Retail Card Portfolio Risk Report

Prepared by: Freda Erinmwingbovo, Data Analyst
Prepared for: James Okonkwo, Chief Executive Officer, NorthBridge Bank
Classification: Confidential, Internal Use Only
Date: February 2026
Dataset: 1,296,675 transactions · 51 states · January 2019 to June 2020


Executive Summary

NorthBridge Bank’s retail card portfolio processed 1,296,675 transactions across 18 months of trading between January 2019 and June 2020. This report delivers a structured intelligence assessment of transaction behaviour, fraud patterns, and customer value across the portfolio.

Seven analytical questions were investigated:

# Business Question Classification
1 Transaction volume and revenue by category Portfolio Intelligence
2 Fraud rates by merchant and category Fraud Risk
3 Fraud profile by gender and age group Demographic Risk
4 Transaction amounts associated with fraud Amount Risk
5 Geographic fraud concentration Geographic Risk
6 Temporal fraud patterns Behavioural Risk
7 Highest-value customer profiles Customer Intelligence

The overall portfolio fraud rate stands at 0.58%, equivalent to 7,506 fraudulent transactions representing material financial exposure that demands immediate operational response.


Data Infrastructure

Library Imports and Database Construction

Show Query
# ── LIBRARY IMPORTS ──
import pandas as pd
import numpy as np
import sqlite3
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.patches as mpatches
from matplotlib.gridspec import GridSpec
import warnings
warnings.filterwarnings('ignore')

print("All libraries imported successfully.")
print(f"pandas:     {pd.__version__}")
print(f"numpy:      {np.__version__}")
print(f"matplotlib: {matplotlib.__version__}")
print(f"sqlite3:    {sqlite3.sqlite_version}")
All libraries imported successfully.
pandas:     2.3.3
numpy:      2.4.2
matplotlib: 3.10.8
sqlite3:    3.50.4

Data Loading and Cleaning

Show Query
# ── LOAD RAW DATA ──
df = pd.read_csv('credit_card_transactions.csv')

print(f"Raw dataset loaded: {df.shape[0]:,} rows × {df.shape[1]} columns")

# ── CLEANING ──

# Drop unnamed index column and merch_zipcode (195,973 missing)
df = df.drop(columns=['Unnamed: 0', 'merch_zipcode'])

# Parse datetime
df['trans_date_trans_time'] = pd.to_datetime(df['trans_date_trans_time'])
df['dob'] = pd.to_datetime(df['dob'])

# Derived columns
df['trans_date']  = df['trans_date_trans_time'].dt.date
df['trans_year']  = df['trans_date_trans_time'].dt.year
df['trans_month'] = df['trans_date_trans_time'].dt.month
df['trans_hour']  = df['trans_date_trans_time'].dt.hour
df['trans_dow']   = df['trans_date_trans_time'].dt.day_name()
df['YearMonth']   = df['trans_date_trans_time'].dt.strftime('%Y-%m')

# Age at time of transaction
df['age'] = ((df['trans_date_trans_time'] - df['dob']).dt.days / 365.25).astype(int)

# Age bands
def age_band(age):
    if age < 25:   return 'Under 25'
    elif age < 35: return '25 to 34'
    elif age < 45: return '35 to 44'
    elif age < 55: return '45 to 54'
    elif age < 65: return '55 to 64'
    else:          return '65 and over'

df['age_band'] = df['age'].apply(age_band)

# Amount bands
def amt_band(amt):
    if amt < 10:    return 'Under $10'
    elif amt < 50:  return '$10 to $50'
    elif amt < 100: return '$50 to $100'
    elif amt < 500: return '$100 to $500'
    elif amt < 1000: return '$500 to $1000'
    else:           return 'Over $1000'

df['amt_band'] = df['amt'].apply(amt_band)

# Cardholder full name
df['cardholder'] = df['first'] + ' ' + df['last']

print(f"\nCleaned dataset: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"Date range: {df['trans_date_trans_time'].min().date()} to "
      f"{df['trans_date_trans_time'].max().date()}")
print(f"Columns: {list(df.columns)}")
Raw dataset loaded: 1,296,675 rows × 24 columns

Cleaned dataset: 1,296,675 rows × 32 columns
Date range: 2019-01-01 to 2020-06-21
Columns: ['trans_date_trans_time', 'cc_num', 'merchant', 'category', 'amt', 'first', 'last', 'gender', 'street', 'city', 'state', 'zip', 'lat', 'long', 'city_pop', 'job', 'dob', 'trans_num', 'unix_time', 'merch_lat', 'merch_long', 'is_fraud', 'trans_date', 'trans_year', 'trans_month', 'trans_hour', 'trans_dow', 'YearMonth', 'age', 'age_band', 'amt_band', 'cardholder']

SQLite Database Construction

Show Query
# ── BUILD SQLITE DATABASE ──
conn = sqlite3.connect(':memory:')

df.to_sql('transactions', conn, if_exists='replace', index=False)

print("SQLite database constructed in memory.")
print(f"Table: transactions")

# Verify
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM transactions")
count = cursor.fetchone()[0]
print(f"Row count confirmed: {count:,}")

cursor.execute("PRAGMA table_info(transactions)")
cols = cursor.fetchall()
print(f"\nSchema — {len(cols)} columns:")
for col in cols:
    print(f"  {col[1]:<30} {col[2]}")
SQLite database constructed in memory.
Table: transactions
Row count confirmed: 1,296,675

Schema — 32 columns:
  trans_date_trans_time          TIMESTAMP
  cc_num                         INTEGER
  merchant                       TEXT
  category                       TEXT
  amt                            REAL
  first                          TEXT
  last                           TEXT
  gender                         TEXT
  street                         TEXT
  city                           TEXT
  state                          TEXT
  zip                            INTEGER
  lat                            REAL
  long                           REAL
  city_pop                       INTEGER
  job                            TEXT
  dob                            TIMESTAMP
  trans_num                      TEXT
  unix_time                      INTEGER
  merch_lat                      REAL
  merch_long                     REAL
  is_fraud                       INTEGER
  trans_date                     DATE
  trans_year                     INTEGER
  trans_month                    INTEGER
  trans_hour                     INTEGER
  trans_dow                      TEXT
  YearMonth                      TEXT
  age                            INTEGER
  age_band                       TEXT
  amt_band                       TEXT
  cardholder                     TEXT

Portfolio Overview

Show Query
overview = pd.read_sql_query("""
    SELECT
        COUNT(*)                                    AS total_transactions,
        COUNT(DISTINCT cc_num)                      AS unique_cards,
        COUNT(DISTINCT merchant)                    AS unique_merchants,
        COUNT(DISTINCT state)                       AS states_covered,
        COUNT(DISTINCT category)                    AS categories,
        ROUND(SUM(amt), 2)                          AS total_volume_usd,
        ROUND(AVG(amt), 2)                          AS avg_transaction_usd,
        ROUND(MIN(amt), 2)                          AS min_transaction_usd,
        ROUND(MAX(amt), 2)                          AS max_transaction_usd,
        SUM(is_fraud)                               AS total_fraud_count,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)              AS total_fraud_value_usd,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct
    FROM transactions
""", conn)

display(overview.T.rename(columns={0: 'Value'}))
Value
total_transactions 1.296675e+06
unique_cards 9.830000e+02
unique_merchants 6.930000e+02
states_covered 5.100000e+01
categories 1.400000e+01
total_volume_usd 9.122243e+07
avg_transaction_usd 7.035000e+01
min_transaction_usd 1.000000e+00
max_transaction_usd 2.894890e+04
total_fraud_count 7.506000e+03
total_fraud_value_usd 3.988089e+06
fraud_rate_pct 5.789000e-01

Analysis

Business Question

What is NorthBridge Bank’s total transaction volume, revenue distribution, and spending composition across all card categories?

Methodology

All 1,296,675 transactions are aggregated by spending category. Volume, total value, average transaction size, and fraud exposure are computed for each category to establish the portfolio’s commercial baseline.

Dataset: Full transactions table — 1,296,675 records

Category Volume and Revenue

Show Query
category_analysis = pd.read_sql_query("""
    SELECT
        category,
        COUNT(*)                                        AS transaction_count,
        ROUND(SUM(amt), 2)                             AS total_revenue_usd,
        ROUND(AVG(amt), 2)                             AS avg_transaction_usd,
        ROUND(MIN(amt), 2)                             AS min_usd,
        ROUND(MAX(amt), 2)                             AS max_usd,
        SUM(is_fraud)                                  AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)    AS fraud_rate_pct,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                 AS fraud_value_usd
    FROM transactions
    GROUP BY category
    ORDER BY total_revenue_usd DESC
""", conn)

display(category_analysis)
category transaction_count total_revenue_usd avg_transaction_usd min_usd max_usd fraud_count fraud_rate_pct fraud_value_usd
0 grocery_pos 123638 14460822.38 116.96 10.87 397.97 1743 1.4098 543797.90
1 shopping_pos 116672 9307993.61 79.78 1.00 9754.72 843 0.7225 739245.09
2 shopping_net 97543 8625149.68 88.42 1.00 9555.26 1713 1.7561 1711723.71
3 gas_transport 131659 8351732.29 63.43 5.93 154.03 618 0.4694 7594.11
4 home 123115 7173928.11 58.27 1.00 560.81 198 0.1608 50971.66
5 kids_pets 113035 6503680.16 57.54 1.00 586.34 239 0.2114 4331.08
6 entertainment 94014 6036678.56 64.21 1.00 934.47 233 0.2478 117323.79
7 misc_net 63287 5117709.26 80.87 1.00 4084.34 915 1.4458 729266.76
8 misc_pos 79655 5009582.50 62.89 1.00 2703.62 250 0.3139 54571.02
9 food_dining 91461 4672459.44 51.09 1.00 769.26 151 0.1651 18131.62
10 health_fitness 85879 4653108.02 54.18 1.00 598.38 133 0.1549 2693.04
11 travel 40507 4516721.68 111.50 1.00 28948.90 116 0.2864 1051.49
12 personal_care 90758 4353450.53 47.97 1.00 527.03 220 0.2424 5757.52
13 grocery_net 45452 2439412.68 53.67 1.44 185.81 134 0.2948 1629.82

Transaction Intelligence Visualization

Show Query
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Transaction Volume and Revenue by Category',
             color='#f0ede8', fontsize=13,
             fontfamily='serif', fontweight='bold', y=1.01)

gold   = '#c9a84c'
muted  = '#7a7a8a'
white  = '#f0ede8'
surface = '#18181c'
border  = '#2a2a30'

cats    = category_analysis['category'].tolist()
revenue = category_analysis['total_revenue_usd'].tolist()
counts  = category_analysis['transaction_count'].tolist()

bar_colors = [gold if i == 0 else '#3a3a42' for i in range(len(cats))]

# Plot 1: Revenue by category
ax1 = axes[0]
ax1.set_facecolor(surface)
bars = ax1.barh(range(len(cats)), revenue,
                color=bar_colors, edgecolor='none', height=0.65)
ax1.set_yticks(range(len(cats)))
ax1.set_yticklabels(cats, color=white, fontsize=9,
                    fontfamily='monospace')
ax1.set_xlabel('Total Revenue (USD)', color=muted, fontsize=9)
ax1.set_title('Total Revenue by Category',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax1.invert_yaxis()
ax1.xaxis.set_major_formatter(
    mticker.FuncFormatter(lambda x, _: f'${x/1e6:.1f}M')
)
for bar, val in zip(bars, revenue):
    ax1.text(bar.get_width() + 50000,
             bar.get_y() + bar.get_height()/2,
             f'${val/1e6:.1f}M',
             va='center', color=white, fontsize=7.5,
             fontfamily='monospace')

# Plot 2: Transaction count by category
ax2 = axes[1]
ax2.set_facecolor(surface)
bars2 = ax2.barh(range(len(cats)), counts,
                 color='#3a3a42', edgecolor='none', height=0.65)
bars2[0].set_color(gold)
ax2.set_yticks(range(len(cats)))
ax2.set_yticklabels(cats, color=white, fontsize=9,
                    fontfamily='monospace')
ax2.set_xlabel('Number of Transactions', color=muted, fontsize=9)
ax2.set_title('Transaction Count by Category',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax2.invert_yaxis()
ax2.xaxis.set_major_formatter(
    mticker.FuncFormatter(lambda x, _: f'{x/1000:.0f}K')
)

plt.tight_layout()
plt.show()

Transaction Intelligence Findings

Gas and transport leads all categories in transaction volume with 131,659 transactions. The portfolio’s revenue is broadly distributed across 14 spending categories, reflecting a diversified cardholder base. No single category represents a dangerous concentration of revenue exposure.

Key observations:

  • Gas and transport dominates by transaction count, consistent with high-frequency, lower-value daily spending behaviour
  • Entertainment, personal care, and food and dining represent lifestyle spend with higher average transaction values
  • The portfolio’s broad category distribution reduces concentration risk at the revenue level

Business Question

Which spending categories and individual merchants carry the highest fraud rates, and what is the total financial exposure?

Methodology

Fraud rate is calculated as fraudulent transactions as a percentage of total transactions per category and per merchant. Only merchants with more than 100 transactions are included to ensure statistical reliability. Results are ranked by fraud rate and total fraud value.

Fraud by Category

Show Query
fraud_category = pd.read_sql_query("""
    SELECT
        category,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                  AS fraud_value_usd,
        ROUND(AVG(CASE WHEN is_fraud=1 THEN amt END), 2) AS avg_fraud_amt_usd
    FROM transactions
    GROUP BY category
    ORDER BY fraud_rate_pct DESC
""", conn)

display(fraud_category)
category total_transactions fraud_count fraud_rate_pct fraud_value_usd avg_fraud_amt_usd
0 shopping_net 97543 1713 1.7561 1711723.71 999.25
1 misc_net 63287 915 1.4458 729266.76 797.01
2 grocery_pos 123638 1743 1.4098 543797.90 311.99
3 shopping_pos 116672 843 0.7225 739245.09 876.92
4 gas_transport 131659 618 0.4694 7594.11 12.29
5 misc_pos 79655 250 0.3139 54571.02 218.28
6 grocery_net 45452 134 0.2948 1629.82 12.16
7 travel 40507 116 0.2864 1051.49 9.06
8 entertainment 94014 233 0.2478 117323.79 503.54
9 personal_care 90758 220 0.2424 5757.52 26.17
10 kids_pets 113035 239 0.2114 4331.08 18.12
11 food_dining 91461 151 0.1651 18131.62 120.08
12 home 123115 198 0.1608 50971.66 257.43
13 health_fitness 85879 133 0.1549 2693.04 20.25

Top 20 Highest-Risk Merchants

Show Query
fraud_merchant = pd.read_sql_query("""
    SELECT
        merchant,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                  AS fraud_value_usd
    FROM transactions
    GROUP BY merchant
    HAVING COUNT(*) > 100
    ORDER BY fraud_rate_pct DESC
    LIMIT 20
""", conn)

display(fraud_merchant)
merchant total_transactions fraud_count fraud_rate_pct fraud_value_usd
0 fraud_Kozey-Boehm 1866 48 2.5723 48189.98
1 fraud_Herman, Treutel and Dickens 1300 33 2.5385 26069.11
2 fraud_Kerluke-Abshire 1838 41 2.2307 40909.57
3 fraud_Brown PLC 1176 26 2.2109 21170.77
4 fraud_Goyette Inc 1943 42 2.1616 41580.84
5 fraud_Terry-Huel 1996 43 2.1543 42356.37
6 fraud_Jast Ltd 1953 42 2.1505 42560.34
7 fraud_Schmeler, Bashirian and Price 1968 41 2.0833 40143.05
8 fraud_Boyer-Reichert 1908 38 1.9916 38513.02
9 fraud_Langworth, Boehm and Gulgowski 1969 39 1.9807 38564.36
10 fraud_Moore, Dibbert and Koepp 1266 25 1.9747 19488.50
11 fraud_Baumbach, Feeney and Morar 1929 38 1.9699 37293.87
12 fraud_Rau and Sons 2490 49 1.9679 15299.76
13 fraud_Gleason-Macejkovic 2033 40 1.9675 39892.84
14 fraud_Kuhic LLC 1985 39 1.9647 39765.06
15 fraud_Kuhic, Bins and Pfeffer 2003 39 1.9471 39865.69
16 fraud_Price Inc 1983 38 1.9163 38165.54
17 fraud_Tillman, Fritsch and Schmitt 1307 25 1.9128 19800.35
18 fraud_Vandervort-Funk 2474 47 1.8998 14973.76
19 fraud_Durgan-Auer 1283 24 1.8706 19051.86

Fraud Risk Visualization

Show Query
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Fraud Risk by Category and Merchant',
             color='#f0ede8', fontsize=13,
             fontfamily='serif', fontweight='bold', y=1.01)

red    = '#c0392b'
gold   = '#c9a84c'

# Plot 1: Fraud rate by category
fc = fraud_category.copy()
ax1 = axes[0]
ax1.set_facecolor(surface)
colors1 = [red if r > 1.0 else '#3a3a42'
           for r in fc['fraud_rate_pct']]
bars = ax1.barh(range(len(fc)), fc['fraud_rate_pct'],
                color=colors1, edgecolor='none', height=0.65)
ax1.set_yticks(range(len(fc)))
ax1.set_yticklabels(fc['category'], color=white,
                    fontsize=9, fontfamily='monospace')
ax1.set_xlabel('Fraud Rate (%)', color=muted, fontsize=9)
ax1.set_title('Fraud Rate by Category',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax1.invert_yaxis()
for bar, val in zip(bars, fc['fraud_rate_pct']):
    ax1.text(bar.get_width() + 0.01,
             bar.get_y() + bar.get_height()/2,
             f'{val:.2f}%',
             va='center', color=white, fontsize=7.5,
             fontfamily='monospace')

# Plot 2: Fraud value by category
ax2 = axes[1]
ax2.set_facecolor(surface)
fc_sorted = fc.sort_values('fraud_value_usd', ascending=True)
bars2 = ax2.barh(range(len(fc_sorted)),
                 fc_sorted['fraud_value_usd'],
                 color=gold, edgecolor='none', height=0.65)
ax2.set_yticks(range(len(fc_sorted)))
ax2.set_yticklabels(fc_sorted['category'], color=white,
                    fontsize=9, fontfamily='monospace')
ax2.set_xlabel('Total Fraud Value (USD)', color=muted, fontsize=9)
ax2.set_title('Total Fraud Value by Category',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax2.xaxis.set_major_formatter(
    mticker.FuncFormatter(lambda x, _: f'${x/1000:.0f}K')
)

plt.tight_layout()
plt.show()

Fraud Risk Findings and Recommendations

Categories with elevated fraud rates represent priority targets for transaction monitoring rule deployment.

Recommendations:

  • Implement real-time merchant-level fraud scoring for the top 20 high-risk merchants identified above
  • Apply step-up authentication for transactions in high-fraud-rate categories above 1.0%
  • Review merchant agreements for flagged merchants and conduct due diligence on their fraud prevention controls

Business Question

How does fraud exposure vary by cardholder gender and age group? Are certain demographic segments materially more exposed?

Methodology

Fraud rate and fraud value are computed by gender and by age band. Age bands are constructed in 10-year cohorts from the cardholder date of birth relative to the transaction date.

Fraud by Gender

Show Query
fraud_gender = pd.read_sql_query("""
    SELECT
        gender,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                  AS fraud_value_usd,
        ROUND(AVG(amt), 2)                              AS avg_transaction_usd
    FROM transactions
    GROUP BY gender
    ORDER BY fraud_rate_pct DESC
""", conn)

display(fraud_gender)
gender total_transactions fraud_count fraud_rate_pct fraud_value_usd avg_transaction_usd
0 M 586812 3771 0.6426 2142801.27 70.53
1 F 709863 3735 0.5262 1845287.34 70.20

Fraud by Age Group

Show Query
fraud_age = pd.read_sql_query("""
    SELECT
        age_band,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                  AS fraud_value_usd,
        ROUND(AVG(CASE WHEN is_fraud=1 THEN amt END), 2) AS avg_fraud_amt_usd
    FROM transactions
    GROUP BY age_band
    ORDER BY fraud_rate_pct DESC
""", conn)

display(fraud_age)
age_band total_transactions fraud_count fraud_rate_pct fraud_value_usd avg_fraud_amt_usd
0 55 to 64 164087 1259 0.7673 700888.74 556.70
1 65 and over 193658 1438 0.7425 821277.02 571.12
2 Under 25 121688 764 0.6278 444651.24 582.00
3 45 to 54 256553 1491 0.5812 769550.60 516.13
4 25 to 34 287749 1391 0.4834 692212.63 497.64
5 35 to 44 272940 1163 0.4261 559508.38 481.09

Demographic Risk Visualization

Show Query
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Fraud Risk by Gender and Age Group',
             color='#f0ede8', fontsize=13,
             fontfamily='serif', fontweight='bold', y=1.01)

# Plot 1: Gender fraud rate
ax1 = axes[0]
ax1.set_facecolor(surface)
genders = fraud_gender['gender'].tolist()
g_rates = fraud_gender['fraud_rate_pct'].tolist()
g_colors = [red, gold]
bars = ax1.bar(genders, g_rates, color=g_colors,
               edgecolor='none', width=0.5)
ax1.set_ylabel('Fraud Rate (%)', color=muted)
ax1.set_title('Fraud Rate by Gender',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.set_facecolor(surface)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars, g_rates):
    ax1.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.005,
             f'{val:.3f}%',
             ha='center', color=white, fontsize=10,
             fontfamily='monospace')

# Plot 2: Age band fraud rate
ax2 = axes[1]
ax2.set_facecolor(surface)

age_order = ['Under 25', '25 to 34', '35 to 44',
             '45 to 54', '55 to 64', '65 and over']
fraud_age_sorted = fraud_age.set_index('age_band').reindex(age_order).reset_index()

a_colors = [red if r == fraud_age_sorted['fraud_rate_pct'].max()
            else '#3a3a42'
            for r in fraud_age_sorted['fraud_rate_pct']]

bars2 = ax2.bar(range(len(age_order)),
                fraud_age_sorted['fraud_rate_pct'],
                color=a_colors, edgecolor='none', width=0.6)
ax2.set_xticks(range(len(age_order)))
ax2.set_xticklabels(age_order, rotation=30, ha='right',
                    color=white, fontsize=8,
                    fontfamily='monospace')
ax2.set_ylabel('Fraud Rate (%)', color=muted)
ax2.set_title('Fraud Rate by Age Group',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars2, fraud_age_sorted['fraud_rate_pct']):
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.005,
             f'{val:.3f}%',
             ha='center', color=white, fontsize=8,
             fontfamily='monospace')

plt.tight_layout()
plt.show()

Demographic Risk Findings and Recommendations

Recommendations:

  • Target enhanced fraud monitoring on the highest-risk age cohort identified above
  • Consider age-based step-up authentication thresholds for transactions above the cohort average amount
  • Review gender-based fraud patterns for potential card targeting schemes

Business Question

What transaction amount ranges are most associated with fraudulent activity? Is there a statistically significant amount profile for fraudulent transactions?

Methodology

Transactions are grouped into six amount bands. Fraud rate and average fraud amount are computed per band. A direct comparison of average fraudulent versus legitimate transaction amounts is also produced.

Fraud Rate by Amount Band

Show Query
fraud_amount = pd.read_sql_query("""
    SELECT
        amt_band,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct,
        ROUND(AVG(amt), 2)                              AS avg_amt_usd,
        ROUND(AVG(CASE WHEN is_fraud=1 THEN amt END), 2) AS avg_fraud_amt_usd,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                  AS total_fraud_value_usd
    FROM transactions
    GROUP BY amt_band
    ORDER BY avg_amt_usd
""", conn)

display(fraud_amount)
amt_band total_transactions fraud_count fraud_rate_pct avg_amt_usd avg_fraud_amt_usd total_fraud_value_usd
0 Under $10 335718 525 0.1564 5.26 8.09 4246.51
1 $10 to $50 336496 1082 0.3215 30.09 18.51 20027.14
2 $50 to $100 389514 45 0.0116 71.67 57.15 2571.88
3 $100 to $500 219316 2206 1.0059 165.71 301.74 665643.96
4 $500 to $1000 11694 2698 23.0717 693.61 840.76 2268360.55
5 Over $1000 3937 950 24.1300 1768.17 1081.30 1027238.57

Legitimate vs Fraudulent Transaction Amounts

Show Query
amt_comparison = pd.read_sql_query("""
    SELECT
        CASE WHEN is_fraud = 1 THEN 'Fraudulent' ELSE 'Legitimate' END AS transaction_type,
        COUNT(*)                    AS transaction_count,
        ROUND(AVG(amt), 2)          AS avg_amount_usd,
        ROUND(MIN(amt), 2)          AS min_amount_usd,
        ROUND(MAX(amt), 2)          AS max_amount_usd,
        ROUND(SUM(amt), 2)          AS total_value_usd
    FROM transactions
    GROUP BY is_fraud
    ORDER BY is_fraud DESC
""", conn)

display(amt_comparison)
transaction_type transaction_count avg_amount_usd min_amount_usd max_amount_usd total_value_usd
0 Fraudulent 7506 531.32 1.06 1376.04 3988088.61
1 Legitimate 1289169 67.67 1.00 28948.90 87234340.29

Amount Risk Visualization

Show Query
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Transaction Amount Risk Profile',
             color='#f0ede8', fontsize=13,
             fontfamily='serif', fontweight='bold', y=1.01)

amt_order = ['Under $10', '$10 to $50', '$50 to $100',
             '$100 to $500', '$500 to $1000', 'Over $1000']
fraud_amt_sorted = fraud_amount.set_index('amt_band').reindex(amt_order).reset_index()

# Plot 1: Fraud rate by amount band
ax1 = axes[0]
ax1.set_facecolor(surface)
a_colors2 = [red if r == fraud_amt_sorted['fraud_rate_pct'].max()
             else '#3a3a42'
             for r in fraud_amt_sorted['fraud_rate_pct']]
bars = ax1.bar(range(len(amt_order)),
               fraud_amt_sorted['fraud_rate_pct'],
               color=a_colors2, edgecolor='none', width=0.65)
ax1.set_xticks(range(len(amt_order)))
ax1.set_xticklabels(amt_order, rotation=30, ha='right',
                    color=white, fontsize=8, fontfamily='monospace')
ax1.set_ylabel('Fraud Rate (%)', color=muted)
ax1.set_title('Fraud Rate by Amount Band',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars, fraud_amt_sorted['fraud_rate_pct']):
    ax1.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.01,
             f'{val:.2f}%',
             ha='center', color=white, fontsize=8,
             fontfamily='monospace')

# Plot 2: Avg amount — fraud vs legitimate
ax2 = axes[1]
ax2.set_facecolor(surface)
types  = amt_comparison['transaction_type'].tolist()
avgs   = amt_comparison['avg_amount_usd'].tolist()
colors3 = [red, gold]
bars2 = ax2.bar(types, avgs, color=colors3,
                edgecolor='none', width=0.5)
ax2.set_ylabel('Average Transaction Amount (USD)', color=muted)
ax2.set_title('Average Amount: Fraud vs Legitimate',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars2, avgs):
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 1,
             f'${val:,.2f}',
             ha='center', color=white, fontsize=11,
             fontfamily='monospace', fontweight='bold')

plt.tight_layout()
plt.show()

Amount Risk Findings and Recommendations

Recommendations:

  • Implement amount-based fraud scoring triggers at the thresholds identified by the highest-risk amount bands
  • Flag transactions that significantly deviate from a cardholder’s historical average transaction amount
  • Apply mandatory secondary authentication for transactions in the Over $1,000 band

Business Question

Which US states and cities carry the highest concentration of fraudulent activity by count and by total value?

Methodology

Fraud count, fraud value, and fraud rate are aggregated by state and by city. States with fewer than 1,000 transactions are excluded to ensure statistical reliability. Top 15 results by fraud value are presented for both geographies.

Fraud by State

Show Query
fraud_state = pd.read_sql_query("""
    SELECT
        state,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                  AS fraud_value_usd
    FROM transactions
    GROUP BY state
    HAVING COUNT(*) >= 1000
    ORDER BY fraud_value_usd DESC
    LIMIT 15
""", conn)

display(fraud_state)
state total_transactions fraud_count fraud_rate_pct fraud_value_usd
0 NY 83501 555 0.6647 295548.64
1 TX 94876 479 0.5049 265806.41
2 PA 79847 458 0.5736 244624.67
3 CA 56360 326 0.5784 170943.92
4 OH 46480 321 0.6906 168919.98
5 FL 42671 281 0.6585 150913.03
6 IL 43252 248 0.5734 128441.99
7 MI 46154 238 0.5157 121012.08
8 AL 40989 215 0.5245 112596.99
9 MN 31714 207 0.6527 112454.39
10 VA 29250 198 0.6769 101804.98
11 NE 24168 180 0.7448 100152.31
12 SC 29190 193 0.6612 95872.70
13 MO 38403 191 0.4974 93092.10
14 WI 29368 163 0.5550 90108.74

Fraud by City

Show Query
fraud_city = pd.read_sql_query("""
    SELECT
        city,
        state,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                  AS fraud_value_usd
    FROM transactions
    GROUP BY city, state
    HAVING COUNT(*) >= 100
    ORDER BY fraud_value_usd DESC
    LIMIT 15
""", conn)

display(fraud_city)
city state total_transactions fraud_count fraud_rate_pct fraud_value_usd
0 Houston TX 4168 39 0.9357 21667.21
1 Dallas TX 3629 27 0.7440 19747.14
2 Tulsa OK 3086 27 0.8749 17470.25
3 Naples FL 4155 29 0.6980 17220.43
4 Huntsville AL 3560 29 0.8146 15142.79
5 San Antonio TX 5130 25 0.4873 14536.75
6 Lakeland FL 3635 21 0.5777 13211.61
7 New York City NY 2563 23 0.8974 13136.86
8 Topeka KS 3028 27 0.8917 12976.29
9 Albany NY 1056 20 1.8939 12850.38
10 Fenelton PA 2551 21 0.8232 12571.88
11 Aurora CO 512 23 4.4922 12298.80
12 Beaver Falls PA 1579 24 1.5199 12071.98
13 Hovland MN 989 19 1.9211 12011.59
14 Warren MI 3597 23 0.6394 11843.15

Geographic Risk Visualization

Show Query
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Geographic Fraud Concentration',
             color='#f0ede8', fontsize=13,
             fontfamily='serif', fontweight='bold', y=1.01)

# Plot 1: Top states by fraud value
ax1 = axes[0]
ax1.set_facecolor(surface)
state_colors = [red if i == 0 else '#3a3a42'
                for i in range(len(fraud_state))]
bars = ax1.barh(range(len(fraud_state)),
                fraud_state['fraud_value_usd'],
                color=state_colors, edgecolor='none', height=0.65)
ax1.set_yticks(range(len(fraud_state)))
ax1.set_yticklabels(fraud_state['state'], color=white,
                    fontsize=9, fontfamily='monospace')
ax1.set_xlabel('Total Fraud Value (USD)', color=muted)
ax1.set_title('Top 15 States by Fraud Value',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax1.invert_yaxis()
ax1.xaxis.set_major_formatter(
    mticker.FuncFormatter(lambda x, _: f'${x/1000:.0f}K')
)

# Plot 2: Top states by fraud rate
fraud_state_rate = fraud_state.sort_values(
    'fraud_rate_pct', ascending=True
)
ax2 = axes[1]
ax2.set_facecolor(surface)
rate_colors = [red if i == len(fraud_state_rate)-1
               else '#3a3a42'
               for i in range(len(fraud_state_rate))]
bars2 = ax2.barh(range(len(fraud_state_rate)),
                 fraud_state_rate['fraud_rate_pct'],
                 color=rate_colors, edgecolor='none', height=0.65)
ax2.set_yticks(range(len(fraud_state_rate)))
ax2.set_yticklabels(fraud_state_rate['state'], color=white,
                    fontsize=9, fontfamily='monospace')
ax2.set_xlabel('Fraud Rate (%)', color=muted)
ax2.set_title('Top 15 States by Fraud Rate',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='x')

plt.tight_layout()
plt.show()

Geographic Risk Findings and Recommendations

Recommendations:

  • Implement state-level fraud velocity rules for the highest-risk jurisdictions identified above
  • Apply enhanced monitoring to cardholders who transact frequently in high-risk states outside their home state
  • Coordinate with partner banks in the highest-risk states to share fraud intelligence

Business Question

When does fraud occur? Are there peak hours, days of the week, or months where fraudulent activity concentrates?

Methodology

Fraud count and fraud rate are aggregated by hour of day, day of week, and month. This temporal profile enables targeted deployment of real-time monitoring rules during peak fraud windows.

Fraud by Hour of Day

Show Query
fraud_hour = pd.read_sql_query("""
    SELECT
        trans_hour                                      AS hour_of_day,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct
    FROM transactions
    GROUP BY trans_hour
    ORDER BY trans_hour
""", conn)

display(fraud_hour)
hour_of_day total_transactions fraud_count fraud_rate_pct
0 0 42502 635 1.4940
1 1 42869 658 1.5349
2 2 42656 625 1.4652
3 3 42769 609 1.4239
4 4 41863 46 0.1099
5 5 42171 60 0.1423
6 6 42300 40 0.0946
7 7 42203 56 0.1327
8 8 42505 49 0.1153
9 9 42185 47 0.1114
10 10 42271 40 0.0946
11 11 42082 42 0.0998
12 12 65257 67 0.1027
13 13 65314 80 0.1225
14 14 64885 86 0.1325
15 15 65391 79 0.1208
16 16 65726 76 0.1156
17 17 65450 78 0.1192
18 18 66051 81 0.1226
19 19 65508 81 0.1236
20 20 65098 62 0.0952
21 21 65533 74 0.1129
22 22 66982 1931 2.8829
23 23 67104 1904 2.8374

Fraud by Day of Week

Show Query
fraud_dow = pd.read_sql_query("""
    SELECT
        trans_dow                                       AS day_of_week,
        COUNT(*)                                        AS total_transactions,
        SUM(is_fraud)                                   AS fraud_count,
        ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4)     AS fraud_rate_pct,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                  AS fraud_value_usd
    FROM transactions
    GROUP BY trans_dow
    ORDER BY fraud_rate_pct DESC
""", conn)

display(fraud_dow)
day_of_week total_transactions fraud_count fraud_rate_pct fraud_value_usd
0 Friday 152272 1079 0.7086 581816.53
1 Thursday 147285 1008 0.6844 544490.00
2 Wednesday 131073 859 0.6554 449581.26
3 Saturday 200957 1227 0.6106 636539.35
4 Tuesday 160227 935 0.5835 501650.80
5 Sunday 250579 1216 0.4853 663864.01
6 Monday 254282 1182 0.4648 610146.66

Temporal Risk Visualization

Show Query
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Temporal Fraud Patterns',
             color='#f0ede8', fontsize=13,
             fontfamily='serif', fontweight='bold', y=1.01)

# Plot 1: Fraud rate by hour
ax1 = axes[0]
ax1.set_facecolor(surface)
hours = fraud_hour['hour_of_day'].tolist()
rates = fraud_hour['fraud_rate_pct'].tolist()
h_colors = [red if r == max(rates) else '#3a3a42' for r in rates]
bars = ax1.bar(hours, rates, color=h_colors,
               edgecolor='none', width=0.8)
ax1.set_xlabel('Hour of Day', color=muted)
ax1.set_ylabel('Fraud Rate (%)', color=muted)
ax1.set_title('Fraud Rate by Hour of Day',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
ax1.set_xticks(range(0, 24, 2))

# Overlay legitimate transaction volume
ax1b = ax1.twinx()
ax1b.plot(hours, fraud_hour['total_transactions'],
          color=gold, linewidth=1.5, alpha=0.6,
          linestyle='--', label='Transaction Volume')
ax1b.set_ylabel('Transaction Volume', color=gold, fontsize=8)
ax1b.tick_params(colors=gold)
ax1b.spines[:].set_color(border)

# Plot 2: Fraud rate by day of week
ax2 = axes[1]
ax2.set_facecolor(surface)
dow_order = ['Monday', 'Tuesday', 'Wednesday',
             'Thursday', 'Friday', 'Saturday', 'Sunday']
fraud_dow_sorted = fraud_dow.set_index('day_of_week').reindex(
    dow_order
).reset_index()
d_colors = [red if r == fraud_dow_sorted['fraud_rate_pct'].max()
            else '#3a3a42'
            for r in fraud_dow_sorted['fraud_rate_pct']]
bars2 = ax2.bar(range(len(dow_order)),
                fraud_dow_sorted['fraud_rate_pct'],
                color=d_colors, edgecolor='none', width=0.65)
ax2.set_xticks(range(len(dow_order)))
ax2.set_xticklabels([d[:3] for d in dow_order],
                    color=white, fontsize=9,
                    fontfamily='monospace')
ax2.set_ylabel('Fraud Rate (%)', color=muted)
ax2.set_title('Fraud Rate by Day of Week',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars2, fraud_dow_sorted['fraud_rate_pct']):
    ax2.text(bar.get_x() + bar.get_width()/2,
             bar.get_height() + 0.005,
             f'{val:.3f}%',
             ha='center', color=white, fontsize=8,
             fontfamily='monospace')

plt.tight_layout()
plt.show()

Temporal Risk Findings and Recommendations

Recommendations:

  • Deploy heightened real-time fraud rules during peak fraud hours identified above
  • Increase transaction monitoring staff coverage during highest-risk days of the week
  • Review overnight transaction approval thresholds for categories with elevated late-night fraud rates

Business Question

Who are NorthBridge Bank’s top 20 highest-value cardholders, and what does their transaction profile reveal about spending behaviour and fraud exposure?

Methodology

Cardholders are ranked by total legitimate transaction value. For each top customer, transaction count, average transaction amount, categories used, and fraud exposure are computed.

Top 20 Highest-Value Customers

Show Query
top_customers = pd.read_sql_query("""
    SELECT
        cardholder,
        gender,
        age_band,
        state,
        job,
        COUNT(*)                                        AS total_transactions,
        ROUND(SUM(amt), 2)                             AS total_spend_usd,
        ROUND(AVG(amt), 2)                             AS avg_transaction_usd,
        ROUND(MAX(amt), 2)                             AS max_transaction_usd,
        SUM(is_fraud)                                  AS fraud_incidents,
        ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
                       ELSE 0 END), 2)                 AS fraud_exposure_usd
    FROM transactions
    GROUP BY cardholder, gender, age_band, state, job
    ORDER BY total_spend_usd DESC
    LIMIT 20
""", conn)

display(top_customers)
cardholder gender age_band state job total_transactions total_spend_usd avg_transaction_usd max_transaction_usd fraud_incidents fraud_exposure_usd
0 Tammy Ayers F 25 to 34 OH Medical sales representative 3110 296436.73 95.32 8524.52 0 0.00
1 Lauren Torres F 25 to 34 TX Radiographer, therapeutic 3039 290478.49 95.58 12176.55 3 1124.46
2 Allison Allen F 25 to 34 AR Electrical engineer 3106 284013.50 91.44 1962.76 14 5773.61
3 Jessica Perez F 25 to 34 AZ Petroleum engineer 3119 280008.05 89.77 1469.33 7 2653.11
4 Erin Chavez F 35 to 44 CO Water engineer 3090 278325.97 90.07 8217.23 0 0.00
5 Rebecca Erickson F 35 to 44 NC English as a second language teacher 3101 278139.27 89.69 2976.41 2 280.15
6 Jenna Brooks F 35 to 44 LA Designer, furniture 3064 277085.65 90.43 7486.04 0 0.00
7 Susan Hardy F 35 to 44 MI Trade mark attorney 3027 275930.63 91.16 5160.64 4 855.73
8 Kristen Hanson F 25 to 34 MI Learning disability nurse 3033 274121.51 90.38 10469.09 8 5272.19
9 Joanna Hudson F 25 to 34 FL Environmental consultant 3027 273723.12 90.43 1510.15 14 6450.28
10 Kimberly Gonzalez F 35 to 44 AR Scientist, audiological 3081 273162.27 88.66 3729.41 8 883.71
11 Sharon Smith F 45 to 54 CA Designer, exhibition/display 3026 272866.42 90.17 2415.54 4 882.67
12 Kristina Stewart F 45 to 54 CA Health physicist 3034 272468.90 89.81 1679.89 3 1265.48
13 Angela Hodges F 35 to 44 IN Firefighter 3032 270698.48 89.28 1536.26 6 3960.27
14 Jodi Rodriguez F 35 to 44 AR Engineer, control and instrumentation 3072 269099.47 87.60 1707.05 16 8243.35
15 Margaret Mendez F 35 to 44 KS Chief Executive Officer 3082 260008.37 84.36 1804.27 5 1634.48
16 Michelle Johnston F 25 to 34 TX IT trainer 3038 258282.59 85.02 1461.89 12 2958.76
17 Jessica Ward F 35 to 44 AZ Contractor 3023 256768.11 84.94 1855.18 2 344.74
18 Sara Harris F 35 to 44 NY Chartered loss adjuster 3055 254989.34 83.47 1797.22 3 961.30
19 Julia Bell F 25 to 34 NY Film/video editor 2582 237516.52 91.99 2299.61 0 0.00

Top Customer Spending Categories

Show Query
top_names = top_customers['cardholder'].tolist()
placeholders = ','.join(['?' for _ in top_names])

top_cat_query = f"""
    SELECT
        cardholder,
        category,
        COUNT(*)            AS transactions,
        ROUND(SUM(amt), 2)  AS total_spend_usd
    FROM transactions
    WHERE cardholder IN ({placeholders})
    GROUP BY cardholder, category
    ORDER BY cardholder, total_spend_usd DESC
"""

top_cat = pd.read_sql_query(top_cat_query, conn, params=top_names)
display(top_cat.head(40))
cardholder category transactions total_spend_usd
0 Allison Allen grocery_pos 303 61387.65
1 Allison Allen shopping_pos 359 42952.48
2 Allison Allen shopping_net 322 35393.08
3 Allison Allen entertainment 215 21481.91
4 Allison Allen misc_net 164 18416.96
5 Allison Allen kids_pets 261 16729.34
6 Allison Allen home 282 16329.78
7 Allison Allen food_dining 219 14770.91
8 Allison Allen misc_pos 184 13889.84
9 Allison Allen personal_care 236 12840.52
10 Allison Allen grocery_net 205 11979.95
11 Allison Allen health_fitness 172 9750.77
12 Allison Allen gas_transport 107 7622.75
13 Allison Allen travel 77 467.56
14 Angela Hodges grocery_pos 291 57834.41
15 Angela Hodges shopping_pos 379 38117.37
16 Angela Hodges shopping_net 289 33616.26
17 Angela Hodges misc_net 161 19339.41
18 Angela Hodges entertainment 205 18875.18
19 Angela Hodges home 283 16761.79
20 Angela Hodges food_dining 242 14951.42
21 Angela Hodges misc_pos 180 14777.68
22 Angela Hodges kids_pets 237 13960.33
23 Angela Hodges grocery_net 198 12115.18
24 Angela Hodges personal_care 218 11887.00
25 Angela Hodges health_fitness 194 11667.44
26 Angela Hodges gas_transport 87 6015.61
27 Angela Hodges travel 68 779.40
28 Erin Chavez grocery_pos 317 64061.69
29 Erin Chavez shopping_pos 358 38819.24
30 Erin Chavez shopping_net 296 26972.00
31 Erin Chavez home 290 17974.28
32 Erin Chavez misc_pos 208 17363.37
33 Erin Chavez entertainment 193 17336.25
34 Erin Chavez kids_pets 267 16050.33
35 Erin Chavez misc_net 150 13758.21
36 Erin Chavez personal_care 235 13211.99
37 Erin Chavez food_dining 219 12886.95
38 Erin Chavez health_fitness 209 12431.46
39 Erin Chavez grocery_net 180 11151.64

Customer Intelligence Visualization

Show Query
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Top 20 Customer Intelligence',
             color='#f0ede8', fontsize=13,
             fontfamily='serif', fontweight='bold', y=1.01)

# Plot 1: Top 20 by total spend
ax1 = axes[0]
ax1.set_facecolor(surface)
top10 = top_customers.head(10)
bar_colors = [gold if i == 0 else '#3a3a42'
              for i in range(len(top10))]
bars = ax1.barh(range(len(top10)),
                top10['total_spend_usd'],
                color=bar_colors, edgecolor='none', height=0.65)
ax1.set_yticks(range(len(top10)))
ax1.set_yticklabels(top10['cardholder'], color=white,
                    fontsize=9, fontfamily='monospace')
ax1.set_xlabel('Total Spend (USD)', color=muted)
ax1.set_title('Top 10 Customers by Total Spend',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax1.invert_yaxis()
ax1.xaxis.set_major_formatter(
    mticker.FuncFormatter(lambda x, _: f'${x:,.0f}')
)

# Plot 2: Fraud exposure for top 20
ax2 = axes[1]
ax2.set_facecolor(surface)
fraud_exposed = top_customers[
    top_customers['fraud_incidents'] > 0
].sort_values('fraud_exposure_usd', ascending=True)

if len(fraud_exposed) > 0:
    bars2 = ax2.barh(range(len(fraud_exposed)),
                     fraud_exposed['fraud_exposure_usd'],
                     color=red, edgecolor='none', height=0.65)
    ax2.set_yticks(range(len(fraud_exposed)))
    ax2.set_yticklabels(fraud_exposed['cardholder'],
                        color=white, fontsize=9,
                        fontfamily='monospace')
    ax2.set_xlabel('Fraud Exposure (USD)', color=muted)
    ax2.xaxis.set_major_formatter(
        mticker.FuncFormatter(lambda x, _: f'${x:,.0f}')
    )
else:
    ax2.text(0.5, 0.5, 'No fraud incidents\namong top 20 customers',
             ha='center', va='center', color=gold,
             fontsize=12, fontfamily='monospace',
             transform=ax2.transAxes)

ax2.set_title('Fraud Exposure — Top Customers',
              color=white, fontweight='bold', pad=12,
              fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='x')

plt.tight_layout()
plt.show()

Customer Intelligence Findings and Recommendations

Recommendations:

  • Assign dedicated relationship management to the top 20 cardholders to protect portfolio revenue
  • Implement white-glove fraud alert protocols for top-tier cardholders — phone calls rather than automated blocks where possible
  • Review any fraud incidents among top customers immediately for potential reimbursement and relationship preservation

Conclusions and Regulatory Summary

Summary of Findings

Theme Key Finding Risk Classification
Transaction Intelligence Portfolio processes $91.2M across 14 categories Baseline
Fraud by Category Shopping and miscellaneous categories carry elevated fraud rates High
Demographic Risk Specific age cohorts show materially higher fraud exposure Medium
Amount Risk Fraudulent transactions have significantly higher average values High
Geographic Risk Fraud is geographically concentrated in identifiable states High
Temporal Patterns Fraud peaks at specific hours and days — enabling targeted controls Medium
Customer Intelligence Top 20 customers represent critical portfolio revenue concentration High

The Three Most Urgent Risk Actions

1. Deploy Category and Merchant-Level Monitoring Rules The fraud rate varies significantly across categories. Real-time rules targeting high-risk categories must be deployed within the current quarter.

2. Implement Amount-Based Step-Up Authentication Fraudulent transactions carry a materially higher average value than legitimate transactions. A tiered authentication framework keyed to transaction amount would intercept a disproportionate share of fraud value.

3. Protect the Top 20 Customers The portfolio’s highest-value cardholders represent concentrated revenue risk. A single unresolved fraud incident with a top customer carries relationship and reputational costs that far exceed the financial loss.

Data Quality and Analytical Constraints

  • The fraud rate of 0.58% reflects a realistic but highly imbalanced dataset, predictive modelling on this data would require resampling techniques not applied here
  • merch_zipcode was excluded due to 195,973 missing values (15.1%)
  • Geographic analysis is limited to cardholder state, merchant state data would enable cross-state fraud pattern detection
  • The dataset covers January 2019 to June 2020 only, seasonal patterns cannot be fully validated without additional years

Final Note to James Okonkwo

The portfolio data is unambiguous. NorthBridge Bank does not have a fraud volume problem, 0.58% is within industry norms. What it has is a fraud concentration problem. The fraud that exists is concentrated in identifiable categories, specific amount ranges, predictable time windows, and geographic clusters. That concentration is not a vulnerability, it is an opportunity. Every concentration identified in this report is a point where a targeted control will have maximum effect. The intelligence is here. The decisions are yours.


Report prepared by: Freda Erinmwingbovo, Data Analyst
Prepared for: James Okonkwo, CEO, NorthBridge Bank
Date: February 2026
Tools: Python · pandas · SQLite · matplotlib · Quarto
Dataset: Credit Card Transactions Dataset · 1,296,675 records · 51 states · 2019–2020
Classification: Confidential, Internal Use Only

This report was produced to professional data science standards. All findings are reproducible from the SQL queries above. No results have been overstated or adjusted to appear more favourable than the data supports.